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Abstract 

Background: The giant Galapagos tortoise, Chelonoidis nigra, is a large-sized terrestrial chelonian of high patrimonial 
interest. The species recently colonized a small continental archipelago, the Galapagos Islands, where it has been facing 
novel environmental conditions and limited resource availability. To explore the genomic consequences of this 
ecological shift, we analyze the transcriptomic variability of five individuals of C nigra, and compare it to similar 
data obtained from several continental species of turtles. 

Results: Having clarified the timing of divergence in the Clielonoidis genus, we report in C. nigra a very low level of 
genetic polymorphism, signatures of a weakened efficacy of purifying selection, and an elevated mutation load in 
coding and regulatory sequences. These results are consistent with the hypothesis of an extremely low long-term 
effective population size in this insular species. Functional evolutionary analyses reveal a reduced diversity of 
immunity genes in C. nigra, in line with the hypothesis of attenuated pathogen diversity in islands, and an increased 
selective pressure on genes involved in response to stress, potentially related to the climatic instability of its environment 
and its elongated lifespan. Finally, we detect no population structure or homozygosity excess in our five-individual 
sample. 

Conclusions: These results enlighten the molecular evolution of an endangered taxon in a stressful environment 
and point to island endemic species as a promising model for the study of the deleterious effects on genome 
evolution of a reduced long-term population size. 



Background 

Evolution on islands is a fascinating topic. A number of 
plant and animal species are known to be endemic from 
small islands or archipelagos, having evolved in isolation 
from their continental relatives during long periods of time. 
Such systems are typically seen as natural laboratories 
for the study of adaptation [1]. Invading an island means 
entering a new biotic environment, that is, a new commu- 
nity of competitors, predators, preys and parasites, and 
a reduced total amount of available food. This sudden 
ecological challenge must be faced by a supposedly small 
number of migrants, in a context of reduced or null gene 
flow from the mainland. The successful colonization of an 
island by a new species is therefore likely to be driven by 
rapid adaptive evolution. Consistently, evolution on islands 
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is often associated with rapid morphological changes [2], 
the observation of which has been of major importance in 
Darwin s thoughts and conceptions. In the genomic era, 
the search for the molecular targets of such adaptive 
processes appears as a promising quest. 

A second reason why island endemic species are of 
specific interest to evolutionary biologists is their sup- 
posedly reduced population size. The effective population 
size (A/g) is a central parameter of the population genetic 
theory, which determines the strength of genetic drift, the 
random fluctuation of allele frequencies generation after 
generation. The theory makes a number of important 
predictions regarding the influence of A/g on patterns of 
molecular diversity. First, small populations are expected 
to be genetically less diverse than large populations because 
of the reduced sojourn time of neutral mutations in the 
former. The existing data seem in broad agreement with 
this prediction at a wide phylogenetic scale [3,4]. In 
studies of more restricted taxonomic groups, a relationship 
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between population size predictors and genetic diversity 
has been reported in fish [5], but not in mammals [6] 
or birds [7], despite abundant genetic data in the latter 
two taxa. 

Importantly, genetic drift is also expected to decrease the 
efficiency of natural selection, as it pushes the frequency 
of an allele up and down irrespective of its contribution to 
fitness. Consequently, natural selection in favor of slightly 
advantageous mutations and in disfavor of slightly dele- 
terious mutations is supposed to be less efficient in small 
than in large populations [8]. It was convincingly argued 
that the A/g effect is the major explanation for the differ- 
ence in genome architecture between prokaryotes and 
large organisms [9]. Besides this contrast, it would appear 
important to determine whether the A/g effect on selection 
efficiency is detectable at a more recent phylogenetic 
scale, that is, between closely-related species. In par- 
ticular, whether species affected by a recent drop in 
population size are 'genetically endangered' (that is, suffer 
from an increased load of deleterious mutations) is still 
debated [6,10]. 

To date, empirical evidence regarding the influence of A/g 
on the efficiency of natural selection is not so abundant. 
The most convincing contribution was the report in mam- 
mals of a positive correlation between body mass and the 
ratio of non-synonymous to synonymous substitutions 
(d-^/ds) [11]. An increased d^/ds ratio is expected when 
the efficiency of purifying selection against deleterious 
non-synonymous changes is weakened. The mammalian 
pattern was therefore interpreted as a A/g effect, plausibly 
assuming that populations of large animals tend to be 
smaller than populations of small animals, on average. 
Still in mammals, the evolutionary rate of non-coding 
sequences upstream and downstream of genes was reported 
to be faster in primates than in rodents, which was inter- 
preted in terms of a reduced A/g in primates [12]. 

One problem for testing the population size effect on 
patterns of molecular variation is that we typically have 
no direct measurement of A/g, which in the above-cited 
studies was indirectly approached through various surro- 
gates (for example, body mass, conservation status, marine 
vs. terrestrial habitat). The long-term average A/g, which is 
the relevant parameter in molecular evolution, may be 
badly predicted by current species abundance [6,7]. Islands 
provide a good opportunity to cope with these problems: 
island endemic species are likely to have evolved in 
small populations during long periods of time, owing 
to the overall limitation in space and resource availability. 
Consistent with this prediction, an increased d-^/ds ratio 
for mitochondrial genes was reported in island endemic 
species, as compared to their mainland relatives [13,14], 
which was again interpreted as reflecting a weakened 
efficiency of purifying selection due to a smaUer long- 
term Nq. 



The signature of a reduced A/g in the islands, although 
significant, was only observed in approximately 60% of 
the island/mainland couples [14], perhaps because of a 
lack of power - just one or two genes were analyzed for 
each species pair (and see reference [15]). It should also 
be noted that the d-^/ds ratio, used in most of the above- 
mentioned studies, is not only determined by the strength 
of purifying selection, but also by the rate of adaptive 
amino-acid substitutions, which in some species can be 
far from negligible [16]. If adaptation on islands was a 
prominent process at the molecular level too, then the 
d^/ds ratio might be inflated independently of demographic 
effects. In this case, the ratio of non-synonymous over 
synonymous polymorphism (hn/hs), not divergence, would 
be a more appropriate marker of a reduced A/g, since it is 
much less affected by adaptive evolution than the d^/ds 
ratio [17]. The avaflability of within-species variation 
data is therefore of primary importance to disentangle 
the adaptive V5. non-adaptive forces driving molecular 
evolution on islands. 

Here we present a population genomic study of the 
giant Galapagos tortoise, Chelonoidis nigra, an endangered 
species endemic from the Galapagos archipelago. Mito- 
chondrial DNA (mtDNA) analyses suggested that this 
insular species has been isolated from the South American 
continent during millions of years [18,19]. C. nigra is 
together with the Aldabra tortoise the largest known 
living species of terrestrial turtles, much larger than its 
mainland congenerics, and can live well above 100 years. 
Its new environment, the Galapagos archipelago, is 
affected by strong climatic fluctuations in space and 
time, with some islands being generally quite arid, and all 
of them experiencing long periods of drought associated 
to 'El Nino' southern oscillations [20]. C. nigra is therefore 
a perfect model for the study of adaptation following 
island colonization. On the other hand, its large size 
combined to its endemic status is suggestive of a small 
long-term A/g for this species, which might have favored 
non-adaptive evolution through reinforced genetic drift. To 
test these hypotheses and quantify the relative influence of 
positive V5. purifying selection, the transcriptomic diversity 
of five individuals was investigated and compared to similar 
data gathered in several continental species of turtle. 

Results 

Datasets 

Five C. nigra individuals from three distinct subspecies 
were analyzed (Table 1, Additional file 1: Figure SI). 
Besides C. nigra, samples from the congeneric red-footed 
tortoise C. carbonaria and from the Spanish pond tur- 
tle Mauremys leprosa (one individual each) were also 
collected. The dataset was completed by transcriptome 
data from the previously published European pond turtle 
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Table 1 Specimens sampled in this study 



Species 




id2'' 


id3^ 


Sampled in 


mtDNA"^ 


Reads (n) 


Mbp 


C nigra 


GA05A 




SRS509366 


Rotterdam Zoo (Netherlands) 


clade c PBL (becl<i) 


8,995,838 


793 


C. nigra 


GA05G 


zuzlO or zuz20 


SRS509367 


A Cupulatta Corsica (France) 


clade c PBL (becki) 


10,770,970 


987 


C. nigra 


GA05H 


zuzlO or zuz20 


SRS509368 


A Cupulatta Corsica (France) 


clade c PBL (becki) 


10,247,396 


940 


C. nigra 


GA05E 


ZUZ30 


SRS509369 


Zurich Zoo (Switzerland) 


clade d, VA (vandenburghi) 


12,862,334 


1,143 


C. nigra 


GA05F 


zuzOI 


SRS509370 


Zurich Zoo (Switzerland) 


clade d, CRU (porteri) 


4,927,381 


440 


C. carbonaria 


GA05D 




SRS509371 


Montpellier Zoo (France) 




9,218,341 


831 


M. leprosa 


GA03B 




SRS509372 


Banyuls (France) 




9,903,466 


885 



^This study. 

■^According to Russello et al. [21]. 
"^NCBI SRA accession id. 

•^Lineage according to Caccone et al. [18] and Russello et al. [21]. 

Emys orbicularis (10 individuals) and pond slider 
Trachemys scripta (three individuals, Table 2). 

Turtle phylogenomics 

We first analyzed a dataset of 248 orthologous genes in 
eight turtle species. The tree topology we obtained 
(Figure 1) was consistent with published Testudines 
phylogenies [25,26]. Branch lengths in Figure 1 are pro- 
portional to time, whereas branch thickness reflects the 
estimated per million years rate of synonymous substitu- 
tion, obtained by dividing synonymous branch lengths 
by absolute divergence time. The three tropical lineages 
(Phrynops, Pelodiscus, and Chelonoidis) showed a higher 
synonymous substitution rate than the marine (Caretta 
caretta) and temperate ones {Emys, Trachemys, Mauremys), 
consistent with a recent report in turtles of a negative 
correlation between synonymous substitution rate and 
latitude in turtles [27] . 

Colors in Figure 1 reflect lineage-specific d^/ds ratio. 
The ratio was highly heterogeneous across lineages, as 
revealed by the significantly better fit of a model assuming 
one specific d^/ds ratio per branch, compared to a one- 
ratio model (2*delta log likelihood = 1,472, 15 degrees of 
freedom, p-val <10'^^°). Interestingly, the C. nigra branch 
was the one showing the highest d^/ds ratio. This appears 
consistent with the hypothesis of faster non-synonymous 



evolutionary rate in the islands [14]. However, according 
to a phylogenetic analysis of five genes in 32 Testudinoi- 
dea species [28], the divergence between the C. nigra 
and C. carbonaria lineages dates back to approximately 
30 miUion years ago, whereas the colonization of the 
Galapagos archipelago by C. nigra must be younger than 3 
to 4 million years, that is, the age of the oldest Galapagos 
Islands (see Discussion). Whether insularity explains 
the high d^ld^ ratio observed in the C. nigra branch is 
therefore unclear. Within-species data are clearly more 
appropriate to specifically address this issue. 

C. nigra coding sequence polymorphism 

In C. nigra, individual genotypes and single nucleotide 
polymorphisms (SNPs) were called using a probabflistic 
approach paying speciflc attention to the removal of spuri- 
ous SNPs due to hidden paralogy. Various conditions were 
tried regarding the stringency of paralog filtering and 
the minimal number of genotyped individuals required 
to validate a SNP (Table 3, see Methods). Only contigs 
for which an ORF longer than 200 base pairs (bp) was 
predicted and an ortholog was detected in C. carbonaria 
were included in this (and subsequent) analyses. Between 
814 and 1,059 predicted ORFs were analyzed, depending 
on the settings. They yielded 769 to 1,933 coding SNPs, 
of which >99.5% were biallelic. Results were reasonably 



Table 2 Turtle transcriptomic sequence datasets analyzed in this study 



Species 


Family 


Ind 


Tissue 


454 


lllu. 


Contigs^ 


N50 


orf'' 


Ref." 


C nigra 


Testudinidae 


5 


Blood 


yes 


yes 


80,693 


685 


10,929 


(1), (2) 


C. carbonaria 


Testudinidae 


1 


Blood 


no 


yes 


11,324 


536 


1,236 


(1) 


M. leprosa 


Geoemydidae 


1 


Blood 


no 


yes 


10,199 


576 


1,439 


(1) 


E. orbicularis 


Emydidae 


10 


Blood 


yes 


yes 


68,096 


720 


10,503 


(3) 


T. scripta 


Emydidae 


3 


Blood, brain 


yes 


yes 


27,235 


667 


3,383 


(3), (4) 


C. caretta 


Cheloniidae 


1 


Blood 


yes 


no 


31,135 


509 


5,142 


(2) 


P. hilarii 


Chelidae 


1 


Blood 


yes 


no 


45,060 


563 


CO 


(2) 



^Number of contigs of length >200 bp. 
■^Number of ORF of length >100 codons. 

"(1): this study; (2) Chiari et al. [22]; (3) Gayral et al. [23]; (4) Tzika et al. [24]. 
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Figure 1 Maximum-likelihood coding sequence analysis of 248 genes in eight turtle species. Branch lengths are proportional to time. 
Branch thickness: number of synonymous changes per million years per 100 synonymous sites. Colors: branch-specific d^/ds ratio. 



consistent across conditions. Increasing the minimal 
required number of individuals resulted in a moderate 
increase in estimated Hs, and a moderate decrease in 
Hn/hs ratio. Increasing the paralog filtering stringency 
resulted in a decreased Hs, as expected, but hardly affected 
the estimated n^/ns. The sampling variance of these 
estimates was acceptably low: the width of bootstrap 
confidence intervals was 10% to 15% of the Hs average, and 
15% to 20% of the n^/ns one. The results corresponding 
to condition A3 are shown in Figure 2, and compared with 
a similar analysis conducted in the continental European 
pond turtle E, orbicularis (Table 3, last line). 

The average estimated synonymous diversity (ns) in C. 
nigra was 0.0013 to 0.0019, that is, a very low value, 
similar to the human one [6]. No positive FIT was detected, 
meaning that the two alleles of a given individual were 
not more similar, on average, than two alleles sampled 
in two distinct individuals. Likewise, no significant FST 
was detected between the two mitochondrial clades 
sampled in this study, that is, clade c (three individuals) and 
clade d (two individuals. Table 1). The estimated average 
Hn/hs ratio in C. nigra was around 0.3. This is higher than 
the highest Hn/hs ratio reported so far (approximately 0.2 
in human). The average C nigra I C, carbonaria d^/ds ratio 



in this analysis was 0.13 to 0.15. This value is similar to 
the human-specific and chimpanzee-specific dN/ds ratios 
[6], but still substantially lower than the C. nigra n^/ns 
ratio. Consequently, the neutrality index was above unity 
(Figure 2), even after removal of low-frequency variants, 
and the proportion of adaptive amino-acid substitution 
estimated by the McDonald-Kreitman method, a and ao.2, 
was below zero in C. nigra. When a method accounting 
for the whole distribution of allele frequencies was used 
[29], an estimate of 0.13 was obtained, with a confidence 
interval including zero. The other turtle species for which 
we have within-species variation data, E. orbicularis, did 
not show such an extreme population genomic pattern: 
Hs and a were higher, and Hn/hs and d^/ds lower, in E. 
orbicularis than in C. nigra (Figure 2). These results are 
indicative of an extremely small long-term A/g in the 
Galapagos tortoise, and suggest that the elevated rate of 
non-synonymous substitutions observed in C. nigra is in 
the first place a consequence of increased genetic drift, 
not accelerated adaptive evolution. 

Flanking sequences 

To further investigate this hypothesis, we examined the 
evolution of potential targets of weak selection, namely 



Table 3 Robustness of non-synonymous and synonymous diversity estimates in the giant Galapagos tortoise 



Condition 


Ind. in) 


Paralogue filter 


Genes (n) 


SNPs in) 


ns (%) 


TTn/TTs 


F,T 


ao.2 


Al 


>3 


Stringent 


1,053 


1,933 


0.18 ±0.03 


0.35 ±0.06 


-0.14 


-0.30 


A2 


>4 


Stringent 


968 


1,538 


0.18 ±0.03 


0.34 ±0.06 


-0.12 


-0.34 


A3 


5 


Stringent 


814 


1,041 


0.19 ±0.04 


0.31 ±0.06 


-0.10 


-0.15 


B1 


>3 


Very stringent 


1,059 


1,423 


0.13 ±0.02 


0.33 ±0.05 


-0.11 


-0.05 


B2 


>4 


Very stringent 


974 


1,131 


0.14 ±0.02 


0.31 ±0.05 


-0.10 


-0.09 


B3 


5 


Very stringent 


820 


769 


0.14 ±0.02 


0.28 ±0.05 


-0.08 


0.18 


Emys 


>5 


Stringent 


953 


1,845 


0.42 ±0.04 


0.09 ±0.02 


0.21 


0.57 
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Figure 2 Distribution of population genomic statistics across predicted coding sequences in C nigra and E orbicularis, tji^/tis 
distribution: coding sequences for wliicli ns = 0 were omitted. Nl distribution: coding sequences for wliicli ds = 0 were omitted. 



coding region-flanking sequences. In C nigra, we extracted 
158 5' UTR and 598 3' UTR sequences of length above 
50 bp, and measured levels of genetic diversity based on 
sites genotyped in at least four individuals (out of five). In 
E. orbicularis, we similarly analyzed 89 5 ' UTR and 457 3 ' 
UTR sequences, requiring that at least eight individuals 
(out of 10) were genotyped. Figure 3 summarizes the 
relative levels of within-species diversity at 5 ' UTR, 3 ' 
UTR, synonymous and non-synonymous sites in the 



2.0 



□ Chelonoidis nigra 

□ Emys orbicularis 



7C5/7CS 



Figure 3 Flanlcing vs. coding region diversity in C nigra and E. 
orbicularis, ns: average nucleotide diversity at synonymous sites; ns: 
average nucleotide diversity at 5'-UTR sites; n^: average nucleotide 
diversity at 3'-UTR sites. Very similar results were obtained when we 
only analyzed predicted cDNAs for which both coding sequence 
and 3' UTR sequence were available, and when we restricted the 
analysis to the 500 first bases of the 3' UTR regions. 



two species. In E. orbicularis, the 5 ' and especially the 3 ' 
UTR sequences showed a significantly lower amount of 
diversity than synonymous sites, suggesting that these 
sequences are under selective pressure. The selective con- 
straint, although detectable, was much less pronounced 
than for non-synonymous sites, as reflected by the higher 
Hutr/tts than Hn/hs ratio, in line with similar observations 
made in mammals [12] and birds [30]. The strength of 
purifying selection on UTR sequences appeared weaker 
in C. nigra. No significant constraint was detected in 5 ' 
flanking sites, and the estimated level of constraint on 3 ' 
flanking sites was reduced, compared to E. orbicularis. 
This is consistent with the Hn/hs pattern across species, 
and with the hypothesis of a reduced A/g in the Galapagos 
tortoise. 

In neither of the two species did we detect a progressive 
decrease in the amount of selective constraint with 
increasing distance to the coding sequence, in contrast 
with the published mammal and bird analyses [12,30]. We 
suggest that the difference may result from the fact that 
we are here analyzing cDNA sequences, whereas the two 
above-cited studies analyzed genomic sequences. Conse- 
quently, our flanking regions only include UTR, whereas 
theirs included both UTR and untranscribed regions, the 
proportion of which presumably increases as ones moves 
away from the coding sequence. For this reason, Hflanidng/ 
Hs estimates should not be directly compared between 
turtles, mammals, and birds across the three studies. 

Gene expression analysis 

Gene-specific expression profiles were estimated in the 
five turtle species for which Illumina data were available. 
Correlation coefficients of log-transformed, normalized 
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expression levels across genes were calculated for each 
pair of species. The Testudinidae (C. nigral C, carbonaria) 
and Emydidae {E.orbicularisIT, scripta) pairs showed the 
highest level of correlation of gene expression (r2 = 0.78 
and r2 = 0.73, respectively). A neighbor-joining analysis 
of the pairwise correlation matrix produced the ((E. orbi- 
cularis, T. scripta), M. leprosa, (C. carbonaria, C. nigra)) 
topology, in agreement with published turtle phylogenies. 
Branch lengths revealed no specific pattern in the C. nigra 
lineage, which did not show an accelerated evolutionary 
rate of gene expression pattern. Rather, the M, leprosa 
lineage was the fastest evolving in this analysis (Additional 
file 2). 

Gene ontology analysis 

Functional annotation of the C. nigra and E, orbicularis 
predicted cDNAs was achieved by sequence similarity 
using the generic GO-slim 'Biological process' gene ontol- 
ogy. Roughly two-thirds of the 2,774 annotated coding se- 
quences (ORFs) were associated to metabolic genes, the 
other ones being associated to transport, regulation, cellular 
growth and organization, or immunity (Figure 4, Additional 
file 3: Table SI). For each GO-slim term, the average 
log-transformed n^/ns ratio was calculated and compared 
between the two species, in search for terms showing 
a specific increase or decrease in selective pressure in 
C. nigra. This was achieved through a z-score analysis 
accounting for sampling variance and global genomic 
trends (see Material and methods). 



Figure 5 plots the term-specific average Hn/hs ratio (left 
panel), and its normalized version (right panel, z-score), in 
C. nigra versus E, orbicularis, A majority of terms show a 
Hn/hs ratio lower than the genomic average, and therefore 
a negative z-score. This reflects the fact that coding se- 
quences for which a functional annotation was obtained 
are more conserved, on average, than the non-annotated 
ones. Colored circles show the terms for which we detected 
a significant contrast in n^/ns ratio between the two 
species, as reflected by the statistics (see Material 
and methods). Four terms showed a significantly posi- 
tive Az, that is, a higher n^/ns ratio in C. nigra than in 
E, orbicularis relative to genomic averages: 'RNA meta- 
bolic process; 'translation^ catabolic process! nucleotide 
metabolic process'. Two terms showed the opposite trend, 
that is, a lower Hn/hs ratio in C. nigra: 'immune system 
process' and 'response to stress' (Additional file 3: 
Table SI). 

Discussion 

Genome-wide studies have been so far largely restricted to 
model organisms, that is, domestic or laboratory species. 
Here, thanks to NGS technologies, we were able to 
characterize for the first time the transcriptomic variability 
and population genomics of an endangered, emblematic 
species, the giant Galapagos tortoise, in absence of prior 
genomic knowledge. 

One specificity of our sample is the lack of geographic 
information: we do not exactly know from which locality 
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Figure 4 Functional annotation of coding sequences analyzed in C nigra and E orbicularis, (a) Pie chart representing the total number of 
coding sequences from the two species in each category; (b) correlation of category-specific abundance between the two species. Only coding 
sequences for which we obtained both a GO-slim term annotation and a Hn/hs estimate were included. The generic categories used in this figure 
were defined by grouping terms from the 'biological process' GO-slim ontology as indicated in Additional file 3: Table SI, second colon. 
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and each island of the Galapagos archipelago the sampled 
individuals come from. This is a general problem with 
C. nigra: previous genetic studies have shown that the 
current location of a turtle is not a reliable indicator of 
its true origin because of human-made translocations 
[21,31]. For this reason, these studies concluded that a 
genetic assignment of lineage of origin is the most 
trustworthy tool to use. Our sample, which covers a 
wide range of C nigra mitochondrial and microsatellite 
lineages, is appropriate with respect to this criterion. 
Importantly, our analysis revealed no departure from 
panmixy in C nigra. This is the most favorable situation for 
studying population genomics and molecular evolution, 
in which any random sample of individuals is expected 
to provide an unbiased estimate of the species genetic 
diversity, irrespective of geography. We conclude that 
this potential concern regarding sampUng is unlikely to 
affect our conclusions, even though they would be worth 
confirming based on a larger population sample. 

Divergence times and substitution rates within the 
chelonoidis genus 

Our phylogenetic analysis revealed a significant heterogen- 
eity in d^ds ratio across branches, with C. nigra showing 
the highest d^/ds ratio of all the analyzed turtle lineages. 
This is consistent with the hypothesis of a reduced popu- 
lation size in the giant Galapagos tortoise. To what extent 
the high d^/ds ratio in C. nigra is explained by its insular- 
ity is unclear, however, and dependent on the timing of 
divergence within the Chelonoidis genus and colonization 
of the Galapagos archipelago. The literature is contradictory 
regarding the divergence date between C. nigra and C. 
carbonaria. In a phylogenetic analysis of five genes in 



32 Testudinoidea species and one fossil calibration, Le 
and McCord [28] estimated that the C. nigra I C, carbonaria 
split occurred 29 +/- 7 million years ago. However, from 
an analysis of mitochondrial DNA within the Chelonoidis 
genus and a biogeographic calibration, it was suggested 
that C. nigra diverged from its closest relative C. chilensis 
as early as 3.2 million years ago [19]. Propagating this 
estimate to the C. nigral C carbonaria pair based on 
their data yields an estimated age of approximately 5 
million years ago for this split, that is, six times younger 
than the Le and McCord figure. 

Our analysis of 248 nuclear genes indicates that the 
average amount of synonymous divergence between C. 
nigra and C. carbonaria is 0.052 substitutions per syn- 
onymous site (Additional file 2). Assuming a divergence 
as recent as approximately 5 million years for this 
split would imply a synonymous substitution rate of 
approximately 5.10"^ substitution per site per million years 
in Chelonoidis, This is well outside the range of nuclear 
synonymous substitution rate recently estimated across 
132 turtle species (three nuclear genes) [27], and 20 times 
as high as the median rate in this study. Such a dramatic 
increase in substitution rate would be expected to 
substantially lengthen Chelonoidis branches in molecular 
phylogenies - an effect that was not conspicuous in ref- 
erences [26] and [28]. In contrast, the 29 million years 
estimate of Le and McCord, which was used in this 
study, would imply a plausible synonymous rate estimate 
of 9.10"^ substitution per site per million years in the 
Chelonoidis genus. One reason for the unexpectedly 
recent divergence times inferred by Poulakakis et al. [19] 
is their assumption that the C. nigral C chilensis split oc- 
curred at the time of the emergence of the first Galapagos 
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Islands, that is, 22 to 4 million years ago. This does not 
need to be necessarily true: the split might have predated 
the emergence of the archipelago, assuming that the 
ancestral species that invaded the islands some Mya has 
gone extinct in the continent since then [18]. According 
to this scenario, only about 10% of the C. nigra branch in 
the tree of Figure 1 would be associated with insularity, 
which perhaps explains the modest d^/ds increase in 
this lineage compared to, for example, the C carbonaria 
branch. 

A typical low-Ne species 

The average synonymous diversity in C. nigra was very low 
(hs = 0.0013 to 0.0019), much lower than in the similarly 
analyzed European pond turtle E, orbicularis. Among ver- 
tebrates, levels of synonymous diversity below our C. nigra 
estimate have only been reported in the common marmo- 
set Callithrix jacchus (hs = 0.0012), and Homo sapiens 
(ns = 0.0012) [6]. Noticeably, this top-three list of low- 
diversity champions includes very diverse species in terms 
of current abundance - the geographically restricted 
Galapagos tortoise, the relatively abundant, locally invasive 
common marmoset, and the highly successful humans. 

Besides its low genome-average level of genetic poly- 
morphism, we found an extremely high ratio of non- 
synonymous to synonymous diversity in C. nigra. Our 
Hn/hs estimate (0.28 to 0.32) is the highest value reported 
so far from genome-wide data, suggesting that purifying 
selection against mildly deleterious non-synonymous vari- 
ants is less effective in C. nigra than in most living species. 
As compared to published values, our estimate might be 
slightly inflated by the absence of a reference genome in C. 
nigra^ and the de novo prediction of ORF we performed. 
However, we note that a similar analysis performed in 
E, orbicularis did not yield such an elevated Hn/hs ratio. 
Divergence analysis consistently revealed a high d^ld^^ 
ratio in C. nigra,, similar to the one reported in human 
(Figure 1, Table 3). The analysis of flanking regions also 
revealed a reduced efficiency of purifying selection as 
compared to E, orbicularis. 

Therefore, for all the population genomic indicators we 
considered, C. nigra showed the expected characteristics of 
a low-population sized species, that is, a reduced amount 
of genetic diversity and a weakened efficiency of selective 
effects. This is consistent with the idea that C. nigra has 
experienced a reduced long-term A^e as a consequence 
of its isolation in a restricted geographic area [14]. This 
result, if confirmed from other case studies, would make 
island endemic species a promising model for the analysis 
of the A/e effect on genome evolution. Besides the variable 
we examined in this study, one may think to investigate, for 
example, the complexity of the transcriptome, proteome 
and interactome in island vs, mainland species, to test the 
suggestion that these features primarily evolve through 



the long-term accumulation of neutral or slightly dele- 
terious elements and interactions [9]. 

From a conservation point of view, the report of a small 
long-term and of elevated Hn/hs and d^lds ratio 
suggests that the giant Galapagos tortoise suffers from 
a particularly heavy load of deleterious mutations, both 
fixed and polymorphic, as compared to the other turtles 
of this study. This might be a matter of worry. However, 
our own species, H, sapiens, is similarly affected by a 
substantial load of deleterious mutations [32], which 
have apparently not hampered its ecological success. 
Obviously, the giant Galapagos tortoise deserves to be 
protected irrespective of its level of genetic diversity and 
mutation load. 

Our data did not reveal any departure from panmixy 
in C. nigra. No nuclear population structure between the 
two sampled mitochondrial lineages was found, and no 
significant excess of individual homozygosity was detected. 
This result is in apparent contradiction with the report 
of significant amounts of genetic differentiation between 
numerous pairs of C. nigra populations, based on a much 
larger sample of tortoises and 10 microsatellite loci [33]. 
The two studies are very different in terms of locus 
sampling, population sampling, and data type, and more 
data and analyses would be required to identify the rea- 
sons for this discrepancy. We note that the E, orbicularis 
analysis revealed a substantial excess of homozygosity and 
significant Fst between populations, suggesting that our 
approach has some power to detect population differ- 
entiation when effective. At any rate, the lack of genetic 
differentiation genome-wide that we report between 
entities that recently reached species level [34] calls for 
a re-examination of the taxonomy in this group. 

Functional population transcriptomics 

The McDonald-Kreitman approach did not reveal any 
evidence for adaptive amino-acid substitution in the C. 
nigra lineage. The Hn/hs ratio was higher than the d^lds 
ratio, and this was stfll true after low-frequency variants 
were accounted for. This negative result, however, might 
be explained by a recent drop of A/e in C. nigra, which 
would have inflated the Hn/hs ratio to a value well above its 
average level during the divergence between C. carbonaria 
and C. nigra, A global analysis of gene expression level 
did not reveal any specific pattern in the C. nigra lineage, 
in which the global rate of gene expression evolution 
did not appear to be accelerated, as compared to other 
turtle species. We note that these results about gene 
expression levels should be taken with caution because 
the physiological state of the sampled individuals was 
not properly controlled, and probably very different be- 
tween species - for instance, £. orbicularis individuals were 
caught from the wild, whereas C. nigra individuals were 
sampled in zoos. 
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The GO-term specific Hn/tts analysis we conducted 
identified four terms for which the Hn/hs ratio is signifi- 
cantly higher in C. nigra than in E. orbicularis relative to 
the genomic average, that is, evidence for relaxed selective 
pressure in the Galapagos tortoise (Figure 5, Additional 
file 3: Table SI). These terms correspond to basic metabolic 
functions of the cell, including 'translation' and 'RNA 
metabolic process'. We note that RNA translation is 
amongst the most energy-consuming cellular functions. 
The increased amount of functional constraint on these 
genes in E, orbicularis might be related to the necessity 
of substantial energetic savings in this temperate species, 
which enters into a deep hypometabolic state during 
wintering [35,36]. 

Two GO-slim terms, on the other hand, yielded evidence 
for reduced relative Hn/hs ratio in C. nigra, namely 
Immune system process' and response to stress'. Immune 
system process' is the only GO-slim term for which the C. 
nigra average (0.22) is below the E, orbicularis one (0.36). 
A high non-synonymous to synonymous ratio in immun- 
ity genes is typical and interpreted as reflecting the effect 
of balancing selection, that is, a selective force favoring 
the diversity of pathogen-responding alleles. Observing a 
relatively low n^/n value for immunity genes in C. nigra is 
therefore suggestive of a reduced pathogenic/parasitic di- 
versity in this insular species. This is in agreement with 
the hypothesis of a shift from antibody-mediated, acquired 
immunity to cell-mediated, innate immunity in insular 
species [37]. 

'Response to stress' is the other GO-slim term for which 
a significant reduction in Hn/hs ratio was detected in 
C nigra, most likely reflecting an increased selective 
constraint on these genes in the giant Galapagos tortoise. 
Seven of the 31C. nigra coding sequences associated to 
this category have homologues that encode for heat-shock 
proteins (for example, dnaj), that is, chaperone proteins 
involved in the response to various kinds of environmental 
stress, such as temperature, infection, and starvation, 
among others (Additional file 4: Table S2). This might be 
related to the highly fluctuating climate of the Galapagos 
Islands, and the long, recurrent periods of aridity that 
animals must face [38]. Besides heat-shock proteins, a 
majority of the C. nigra coding sequences associated to 
the response to stress' term have homologues related to 
the control of oxidative stress (for example, per oxydase), 
DNA replication/repair (for example, photolyase), and 
apoptosis (for example, ^c/2-associated transcription 
factor. Additional file 4: Table S2), that is, functions that 
have been linked to the regulation of ageing and longevity 
[39]. Our results suggest that the giant Galapagos tortoises, 
which can live well above 100 years in a warm, mutagenic 
environment, are undergoing a particularly strong selective 
pressure with respect to the management of oxidative 
molecular species and DNA damage. 



Conclusions 

Our analyses of transcriptomic diversity in C. nigra revealed 
that molecular evolution in the giant Galapagos tortoise 
is strongly influenced by an increased rate of genetic 
drift, which weakens the efficiency of natural selection 
and creates a substantial mutation load. The relaxation of 
purif)dng selection affects most genes, regulatory regions, 
and functions, with the exception of genes involved in 
response to stress, on which selective pressure has been 
reinforced, presumably in response to the new environ- 
mental conditions and elongated life span in this species. 
This study points to insular species as a promising model 
to explore further the effect of genetic drift on genomic 
patterns, and its relationship with gene function and 
adaptation. 

Material and methods 

Sampling and sequencing 

Blood from five adult C. nigra, one C. carbonaria and one 
M, leprosa individuals were sampled from public European 
zoological collections (Table 1). The sampling and animal 
handling have been done by veterinarians and staff of 
the Montpellier Zoo, Zurich Zoo, Rotterdam Zoo, and 
A Cupulatta Zoo according to the Code of Practice and 
Code of Ethics established by the European Association 
of Zoos and Aquarias. RNA was extracted and cDNA 
libraries were sequenced using Genome Analyzer II 
(Illumina®, see Additional file 2). Sequence reads were 
deposited in the NCBI Sequence Read Archive data- 
base as bioproject PRJNA230239, biosample accession 
SRS509366-SRS509372 (see Table 1). 

Transcriptome assembly and genotype calling 

Contigs (predicted cDNAs) were assembled following 
reference [40]. Illumina reads were mapped to the contigs 
using BWA [41]. In C. nigra, single-nucleotide polymor- 
phisms (SNPs) and genotypes were called using the method 
described in reference [42]. Dubious SNPs potentially 
resulting from hidden paralogy were cleaned using the 
method introduced in reference [22]. This method cal- 
culates the likelihood under a two-locus model (that is, 
assuming that two distinct genes have contributed reads 
which were erroneously assembled in a single contig), and 
discard SNPs that reveal a significant increase in likelihood, 
as compared to the one-locus model. Two stringency levels 
were tried in this analysis: stringent (likelihood ratio 
test P value <0.01) and very stringent (likelihood ratio test 
P value <0.05). 

Polymorphism analysis 

For each ORE, the following statistics were calculated: 
per-site synonymous (hs) and non-synonymous (hn) diver- 
sity in C. nigra, per-individual heterozygosity, overall het- 
erozygote deficiency (Fix), number of synonymous (ps) ^nd 
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non-synonymous (p^) segregating sites in C. nigra, num- 
ber of synonymous {ds) and non-synonymous {ds) fixed 
differences between C. nigra and C. carbonaria, neutrality 
index NI = (p^lps)l{d^lds)y neutrality index calculated 
after removing SNPs for which the minor allele frequency 
was below 0.2 (NIo.2)> estimated fraction of adaptive 
amino-acid substitutions a = 1-NI, and its corrected 
version ao.2 = I-NI0.2. An additional estimate of a, called 
^Ewio was calculated according to a method based on the 
inferred site frequency spectrum [29], that is, the distri- 
bution of minor allele frequency across SNPs. A similar 
analysis was conducted in the European pond turtle E, 
orbicularis based on a sample of 10 individuals, using 
the pond slider T, scripta as outgroup, reproducing a 
previously published analysis [22]. In both C. nigra and 
E, orbicularis, we extracted 5' and 3' UTR sequences 
from contigs in which a start and/or a stop codon had 
been recovered. UTR sequences of length 50 bp or more 
available in at least four (C. nigra) or eight (£. orbicularis) 
individuals were selected. UTR sequence polymorphism 
was estimated using the nucleotide diversity index, sep- 
arately averaged across 5 ' UTR and 3 ' UTR sequences. 

Gene ontology analysis 

The generic GO-slim ontology [23] was used to explore the 
distribution of the Hn/hs ratio across ftinctional categories 
of genes in C nigra vs. E, orbicularis, in search for gene sets 
showing a specific increase/decrease of selective pressure 
in one of the two species. GO-slim is a contracted version 
of the Gene Ontology database including a relatively 
small number of high-level, little-overlapping terms. A 
term-averaged log-transformed Hn/hs ratio was calculated 
for each species and each term, and compared between 
species. A resampling procedure was designed to quantify, 
for each term, the significance of the normalized difference 
in Hn/hs ratio between the two species (see Additional 
file 2). 
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